Read Dataset

In [1]:
import numpy as np
import pandas as pd 
A = pd.read_csv("FashionDataset.csv")
A
Out [1]:
Unnamed: 0 BrandName Deatils Sizes MRP SellPrice Discount Category
0 0 life solid cotton blend collar neck womens a-line d... Size:Large,Medium,Small,X-Large,X-Small Rs\n1699 849 50% off Westernwear-Women
1 1 only polyester peter pan collar womens blouson dres... Size:34,36,38,40 Rs\n3499 2449 30% off Westernwear-Women
2 2 fratini solid polyester blend wide neck womens regular... Size:Large,X-Large,XX-Large Rs\n1199 599 50% off Westernwear-Women
3 3 zink london stripes polyester sweetheart neck womens dress... Size:Large,Medium,Small,X-Large Rs\n2299 1379 40% off Westernwear-Women
4 4 life regular fit regular length denim womens jeans ... Size:26,28,30,32,34,36 Rs\n1699 849 50% off Westernwear-Women
... ... ... ... ... ... ... ... ...
30753 21 swarovski crystal stylish womens rodhium earrings Nan Nan 8950 Nan Jewellery-Women
30754 22 Nan Nan Nan Nan Nan Nan Jewellery-Women
30755 23 jewelz ethnic gold plated jhumki earrings Nan Rs\n1839 643 65% off Jewellery-Women
30756 24 estelle womens gold plated double line fancy white and... Nan Nan 2799 Nan Jewellery-Women
30757 25 estelle womens gold plated bridge designer mangalsutra... Nan Nan 1899 Nan Jewellery-Women

30758 rows × 8 columns
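
The preview shows two quirks that can be handled when the file is loaded: an extra `Unnamed: 0` column, and missing values written as the literal string `Nan`, which pandas does not treat as missing by default. A minimal sketch, assuming a tiny in-memory sample in place of the real file (`sample_csv` and `df` are illustrative names):

```python
import io

import pandas as pd

# Hypothetical three-row sample laid out like the preview above.
sample_csv = """,BrandName,SellPrice,Category
0,life,849,Westernwear-Women
1,only,2449,Westernwear-Women
22,Nan,Nan,Jewellery-Women
"""

# na_values=["Nan"] makes the literal string "Nan" parse as a missing value.
df = pd.read_csv(io.StringIO(sample_csv), na_values=["Nan"])

# The unnamed first column is read as "Unnamed: 0"; drop it straight away.
df = df.drop(columns="Unnamed: 0")

print(df.isna().sum())
```

Loading the data this way would make the later replace step unnecessary; the notebook performs it as a separate cell instead.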

In [2]:
A.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30758 entries, 0 to 30757
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  30758 non-null  int64 
 1   BrandName   30758 non-null  object
 2   Deatils     30758 non-null  object
 3   Sizes       30758 non-null  object
 4   MRP         30758 non-null  object
 5   SellPrice   30758 non-null  object
 6   Discount    30758 non-null  object
 7   Category    30758 non-null  object
dtypes: int64(1), object(7)
memory usage: 1.9+ MB

Data preparation

In [3]:
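A.isna().sum()  # not displayed: only the cell's last expression is echoed
A = A.drop(labels=["Unnamed: 0"], axis=1)  # drop the redundant row-number column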
In [4]:
A.shape
Out [4]:
(30758, 7)
In [5]:
A
Out [5]:
BrandName Deatils Sizes MRP SellPrice Discount Category
0 life solid cotton blend collar neck womens a-line d... Size:Large,Medium,Small,X-Large,X-Small Rs\n1699 849 50% off Westernwear-Women
1 only polyester peter pan collar womens blouson dres... Size:34,36,38,40 Rs\n3499 2449 30% off Westernwear-Women
2 fratini solid polyester blend wide neck womens regular... Size:Large,X-Large,XX-Large Rs\n1199 599 50% off Westernwear-Women
3 zink london stripes polyester sweetheart neck womens dress... Size:Large,Medium,Small,X-Large Rs\n2299 1379 40% off Westernwear-Women
4 life regular fit regular length denim womens jeans ... Size:26,28,30,32,34,36 Rs\n1699 849 50% off Westernwear-Women
... ... ... ... ... ... ... ...
30753 swarovski crystal stylish womens rodhium earrings Nan Nan 8950 Nan Jewellery-Women
30754 Nan Nan Nan Nan Nan Nan Jewellery-Women
30755 jewelz ethnic gold plated jhumki earrings Nan Rs\n1839 643 65% off Jewellery-Women
30756 estelle womens gold plated double line fancy white and... Nan Nan 2799 Nan Jewellery-Women
30757 estelle womens gold plated bridge designer mangalsutra... Nan Nan 1899 Nan Jewellery-Women

30758 rows × 7 columns

In [6]:
A.nunique()
Out [6]:
BrandName      275
Deatils      23877
Sizes         1172
MRP           1097
SellPrice     2046
Discount        66
Category         7
dtype: int64

Missing values are stored as the string "Nan". Replace them with NaN, then drop every row that contains one.

In [7]:
A.replace("Nan",np.nan,inplace=True)
In [8]:
A.dropna(axis=0,inplace=True)
In [9]:
A
Out [9]:
BrandName Deatils Sizes MRP SellPrice Discount Category
0 life solid cotton blend collar neck womens a-line d... Size:Large,Medium,Small,X-Large,X-Small Rs\n1699 849 50% off Westernwear-Women
1 only polyester peter pan collar womens blouson dres... Size:34,36,38,40 Rs\n3499 2449 30% off Westernwear-Women
2 fratini solid polyester blend wide neck womens regular... Size:Large,X-Large,XX-Large Rs\n1199 599 50% off Westernwear-Women
3 zink london stripes polyester sweetheart neck womens dress... Size:Large,Medium,Small,X-Large Rs\n2299 1379 40% off Westernwear-Women
4 life regular fit regular length denim womens jeans ... Size:26,28,30,32,34,36 Rs\n1699 849 50% off Westernwear-Women
... ... ... ... ... ... ... ...
26673 lemon & pepper womens casual wear buckle closure flats - navy Size:36,37,38,39,40 Rs\n2999 1499 50% off Footwear-Women
26674 haute curry womens casual wear slip on heels - black Size:36,37,38,39,40 Rs\n2199 1099 50% off Footwear-Women
26885 swiss eagle womens analogue metallic watch Size:Error Size Rs\n13990 4197 70% off Watches-Women
27290 lawman watches womens rose gold dial stainless steel analogue... Size:Error Size Rs\n7499 4999 33% off Watches-Women
28418 lawman watches womens silver dial stainless steel analogue wa... Size:Error Size Rs\n5999 3999 33% off Watches-Women

18374 rows × 7 columns
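
In the notebook this step reduces 30758 rows to 18374. The reason the replace is needed first is that `isna()` does not recognise the string "Nan". A minimal sketch on a hypothetical toy frame (`toy`, `missing_before`, `placeholders` and `cleaned` are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; "Nan" is a plain string here, as in the dataset.
toy = pd.DataFrame({
    "BrandName": ["life", "Nan", "estelle"],
    "MRP": ["Rs\n1699", "Nan", "Nan"],
})

missing_before = toy.isna().sum()    # strings are not counted as missing
placeholders = (toy == "Nan").sum()  # count the placeholder strings instead

# Same two steps as the cells above, written without inplace=True.
cleaned = toy.replace("Nan", np.nan).dropna(axis=0)

print(missing_before.to_dict())
print(placeholders.to_dict())
print(len(cleaned))
```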

The MRP values are numeric, but the "Rs\n" prefix makes the column a string column, so the prefix is stripped and the values are converted to integers.

In [10]:
from re import sub

Q = []
for i in A.MRP:
    # Remove the characters "R", "s" and newline, leaving only the digits
    Q.append(int(sub("[Rs\n]", "", i)))
In [11]:
A.MRP = Q
In [12]:
S = A.SellPrice 
X = pd.to_numeric(S)
In [13]:
A.SellPrice = X
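
The same conversion can also be written without an explicit loop. A minimal sketch, assuming values of the form "Rs\n<digits>" as in the preview (`mrp_raw`, `mrp` and `sell` are illustrative names, and the sample values are made up to match that form):

```python
import pandas as pd

# Hypothetical sample in the same "Rs\n<digits>" form as the MRP column.
mrp_raw = pd.Series(["Rs\n1699", "Rs\n3499", "Rs\n13990"], name="MRP")

# Pull out the run of digits and cast to int in one vectorized step.
mrp = mrp_raw.str.extract(r"(\d+)", expand=False).astype(int)

# SellPrice holds digit strings; pd.to_numeric converts them directly.
sell = pd.to_numeric(pd.Series(["849", "2449", "4197"], name="SellPrice"))

print(mrp.tolist())
print(sell.tolist())
```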

EDA

In [14]:
A.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 18374 entries, 0 to 28418
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   BrandName  18374 non-null  object
 1   Deatils    18374 non-null  object
 2   Sizes      18374 non-null  object
 3   MRP        18374 non-null  int64 
 4   SellPrice  18374 non-null  int64 
 5   Discount   18374 non-null  object
 6   Category   18374 non-null  object
dtypes: int64(2), object(5)
memory usage: 1.1+ MB
In [15]:
A.head()
Out [15]:
BrandName Deatils Sizes MRP SellPrice Discount Category
0 life solid cotton blend collar neck womens a-line d... Size:Large,Medium,Small,X-Large,X-Small 1699 849 50% off Westernwear-Women
1 only polyester peter pan collar womens blouson dres... Size:34,36,38,40 3499 2449 30% off Westernwear-Women
2 fratini solid polyester blend wide neck womens regular... Size:Large,X-Large,XX-Large 1199 599 50% off Westernwear-Women
3 zink london stripes polyester sweetheart neck womens dress... Size:Large,Medium,Small,X-Large 2299 1379 40% off Westernwear-Women
4 life regular fit regular length denim womens jeans ... Size:26,28,30,32,34,36 1699 849 50% off Westernwear-Women
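
With both price columns numeric, the Discount strings can be checked against the prices. A minimal sketch using the first four preview rows, assuming the discount is defined as 1 - SellPrice / MRP rounded to a whole percent (`rows`, `listed` and `implied` are illustrative names):

```python
import pandas as pd

# First four rows of the cleaned preview above.
rows = pd.DataFrame({
    "MRP": [1699, 3499, 1199, 2299],
    "SellPrice": [849, 2449, 599, 1379],
    "Discount": ["50% off", "30% off", "50% off", "40% off"],
})

# Listed discount as an integer percentage, e.g. "50% off" -> 50.
listed = rows["Discount"].str.extract(r"(\d+)%", expand=False).astype(int)

# Discount implied by the two price columns, rounded to a whole percent.
implied = ((1 - rows["SellPrice"] / rows["MRP"]) * 100).round().astype(int)

print((listed == implied).all())
```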

Univariate analysis of continuous columns

In [16]:
import seaborn as sb
sb.distplot(A.MRP)
Out [16]:
<AxesSubplot:xlabel='MRP', ylabel='Density'>
In [17]:
import seaborn as sb
sb.distplot(A.SellPrice)
Out [17]:
<AxesSubplot:xlabel='SellPrice', ylabel='Density'>
In [18]:
A.describe()
Out [18]:
MRP SellPrice
count 18374.000000 18374.000000
mean 2136.928704 1163.798846
std 1189.416850 744.201506
min 171.000000 114.000000
25% 1299.000000 659.000000
50% 1899.000000 995.000000
75% 2663.000000 1469.000000
max 16999.000000 13599.000000
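
The quartiles in this summary can be turned into the usual 1.5 x IQR fences that box plots use to flag outliers. A minimal sketch, with the quartiles copied by hand from the output above (`quartiles` and `fences` are illustrative names):

```python
# Quartiles (25% and 75%) copied from the describe() output above.
quartiles = {
    "MRP": (1299.0, 2663.0),
    "SellPrice": (659.0, 1469.0),
}

fences = {}
for column, (q1, q3) in quartiles.items():
    iqr = q3 - q1  # interquartile range
    fences[column] = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print(fences)
```

The upper fences come out at 4709 for MRP and 2684 for SellPrice. Both maxima (16999 and 13599) lie far above them, which points to a long right tail in each column.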

Plotting bar charts and other plots using Matplotlib and seaborn

In [19]:
import matplotlib.pyplot as plt 
import seaborn as sb
In [20]:
plt.figure(figsize=(15,10))
plt.subplot(1,1,1)
sb.countplot(x=A.Category)
Out [20]:
<AxesSubplot:xlabel='Category', ylabel='count'>
In [21]:
A.BrandName.value_counts().head(25).plot(kind="pie")
Out [21]:
<AxesSubplot:ylabel='BrandName'>
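
A pie chart of the top brands omits every brand below the cut-off, so the slices no longer represent shares of the whole dataset. One option is to add an "other" bucket before plotting. A minimal sketch on a hypothetical brand column (`brands`, `top_n`, `top` and `shares` are illustrative names; the cut-off of 2 stands in for the notebook's 25):

```python
import pandas as pd

# Hypothetical brand column standing in for A.BrandName.
brands = pd.Series(["life", "life", "only", "life", "only", "fratini", "estelle"])

top_n = 2  # illustrative cut-off; the notebook uses 25
counts = brands.value_counts()

# Keep the top_n brands and fold everything else into one "other" entry.
other = pd.Series({"other": counts.iloc[top_n:].sum()})
top = pd.concat([counts.head(top_n), other])

shares = top / top.sum()
print(shares)
```

Calling `top.plot(kind="pie")` on the result would then show slices that add up to the full dataset.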
In [22]:
A.Sizes.value_counts().head(30).plot(kind="bar")
Out [22]:
<AxesSubplot:>
In [23]:
sb.scatterplot(x=A.SellPrice, y=A.MRP, hue=A.Category)
plt.xlabel("SellPrice")
plt.ylabel("MRP")
plt.xticks(range(0,14000,3000))
In [24]:
plt.figure(figsize=(20,10))
sb.boxplot(x=A.Category, y=A.SellPrice)
# Compare the SellPrice distribution across categories
Out [24]:
<AxesSubplot:xlabel='Category', ylabel='SellPrice'>
In [25]:
plt.figure(figsize=(20,10))
sb.boxplot(x=A.Category, y=A.MRP)
# Compare the MRP distribution across categories
Out [25]:
<AxesSubplot:xlabel='Category', ylabel='MRP'>
In [26]:
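# Correlate only the numeric columns; newer pandas versions raise an error on text columns
sb.heatmap(A[["MRP", "SellPrice"]].corr())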
Out [26]:
<AxesSubplot:>
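
With only two numeric columns the heatmap encodes a single off-diagonal value, so printing the matrix can be more informative than colouring it. A minimal sketch using the price columns of the five preview rows shown earlier (`prices`, `corr` and `r` are illustrative names; five rows are not representative of the full dataset):

```python
import pandas as pd

# Price columns from the five preview rows shown earlier.
prices = pd.DataFrame({
    "MRP": [1699, 3499, 1199, 2299, 1699],
    "SellPrice": [849, 2449, 599, 1379, 849],
})

corr = prices.corr()  # Pearson correlation by default
r = corr.loc["MRP", "SellPrice"]

print(corr.round(3))
```

On the full data, `sb.heatmap(..., annot=True)` would print the coefficient inside each cell.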